Improving the Transient Times for Distributed Stochastic Gradient Methods
Authors
Abstract
We consider the distributed optimization problem in which $n$ agents, each possessing a local cost function, collaboratively minimize the average of the cost functions over a connected network. Assuming stochastic gradient information is available, we study a distributed stochastic gradient algorithm, called exact diffusion with adaptive stepsizes (EDAS), adapted from the Exact Diffusion method [1] and NIDS [2], and perform a non-asymptotic convergence analysis. We not only show that EDAS asymptotically achieves the same network-independent convergence rate as centralized stochastic gradient descent (SGD) for minimizing strongly convex and smooth objective functions, but also characterize the transient time needed for the algorithm to approach the asymptotic convergence rate, which behaves as $K_{T}=\mathcal{O}(\frac{n}{1-\lambda_{2}})$, where $1-\lambda_{2}$ stands for the spectral gap of the mixing matrix. To the best of our knowledge, the obtained transient time is the shortest known when the cost functions are smooth. Numerical simulations further corroborate and strengthen the obtained theoretical results.
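The EDAS update rule itself is not given in this abstract. As a rough sketch of the underlying Exact Diffusion adapt/correct/combine recursion [1] driven by a stochastic gradient oracle, the following toy example uses hypothetical scalar quadratic costs, a ring network, and a constant stepsize, all of which are illustrative assumptions rather than values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup (not from the paper): n agents with scalar quadratic
# local costs f_i(x) = 0.5*a_i*x^2 - b_i*x, so the minimizer of the average
# cost is x_star = sum(b) / sum(a).
n = 5
a = rng.uniform(1.0, 2.0, n)
b = rng.uniform(-1.0, 1.0, n)
x_star = b.sum() / a.sum()

# Symmetric doubly stochastic mixing matrix for a ring network.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 1.0 / 3.0
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0
W_bar = 0.5 * (np.eye(n) + W)  # Exact Diffusion combines with (I + W) / 2

def stoch_grad(x, sigma=0.01):
    # each agent's local gradient plus zero-mean noise (stochastic oracle)
    return a * x - b + sigma * rng.standard_normal(n)

alpha = 0.1          # constant stepsize in this sketch (EDAS adapts stepsizes)
x = np.zeros(n)      # agent states
psi_prev = x.copy()  # initialization psi_0 := x_0
for k in range(2000):
    psi = x - alpha * stoch_grad(x)  # adapt: local stochastic gradient step
    phi = psi + x - psi_prev         # correct: removes constant-stepsize bias
    x = W_bar @ phi                  # combine: mix with neighbors
    psi_prev = psi
```

With exact gradients this recursion converges to the exact minimizer under a constant stepsize, which is the property EDAS inherits and combines with a network-independent asymptotic rate.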
Similar References
Distributed Stochastic Gradient MCMC
Probabilistic inference on a big data scale is becoming increasingly relevant to both the machine learning and statistics communities. Here we introduce the first fully distributed MCMC algorithm based on stochastic gradients. We argue that stochastic gradient MCMC algorithms are particularly suited for distributed inference because individual chains can draw mini-batches from their local pool ...
Variance Reduction for Distributed Stochastic Gradient Descent
Variance reduction (VR) methods boost the performance of stochastic gradient descent (SGD) by enabling the use of larger, constant stepsizes and preserving linear convergence rates. However, current variance reduced SGD methods require either high memory usage or an exact gradient computation (using the entire dataset) at the end of each epoch. This limits the use of VR methods in practical dis...
Without-Replacement Sampling for Stochastic Gradient Methods
Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In contrast, sampling without replacement is far less understood, yet in practice it is very common, often easier to implement, and usually performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling...
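As a rough illustration of the sampling scheme this blurb discusses, a without-replacement (random reshuffling) SGD loop on a hypothetical least-squares instance might look as follows; all problem data and parameters here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical consistent least-squares data (not from the paper):
# f(w) = (1/n) * sum_i 0.5 * (a_i^T w - b_i)^2, minimized at w_true.
n, d = 100, 3
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
bvec = A @ w_true

w = np.zeros(d)
alpha = 0.02
for epoch in range(40):
    # Without-replacement sampling: each epoch visits every data point
    # exactly once, in a fresh random order (random reshuffling).
    for i in rng.permutation(n):
        w -= alpha * A[i] * (A[i] @ w - bvec[i])
```

The only difference from with-replacement SGD is drawing indices from a permutation instead of sampling independently each step, which is the regime the paper analyzes.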
Semi-Stochastic Gradient Descent Methods
In this paper we study the problem of minimizing the average of a large number (n) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs in each of which a single full gradient and a random number of stochastic gradients is computed, following a geometric law. The total work needed for the method to output an ε-ac...
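The epoch structure described above (one full gradient, then a geometrically distributed number of cheap semi-stochastic steps) can be sketched roughly as follows; the problem data, stepsize, and geometric parameter are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical least-squares instance (not from the paper):
# f(w) = (1/n) * sum_i 0.5 * (a_i^T w - b_i)^2.
n, d = 200, 5
A = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
bvec = A @ w_true

def full_grad(w):
    return A.T @ (A @ w - bvec) / n

def stoch_grad(w, i):
    return A[i] * (A[i] @ w - bvec[i])

w = np.zeros(d)
alpha, m, p = 0.01, 100, 0.05  # illustrative stepsize, cap, geometric parameter
for epoch in range(50):
    g = full_grad(w)              # one full gradient per epoch
    x = w.copy()                  # snapshot point for variance reduction
    t = min(rng.geometric(p), m)  # random number of inner steps (geometric law)
    for _ in range(t):
        i = rng.integers(n)
        # semi-stochastic gradient: one cheap stochastic correction of the
        # stored full gradient
        w = w - alpha * (stoch_grad(w, i) - stoch_grad(x, i) + g)
```

The correction term vanishes in expectation at the snapshot, so each inner step is an unbiased, reduced-variance estimate of the full gradient.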
Towards Stochastic Conjugate Gradient Methods
The method of conjugate gradients provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient. Here we explore a number of ways to adopt ideas from conjugate gradient in the stochastic setting, using fast Hessian-vector products to obtain curvature information cheaply. I...
Journal
Journal title: IEEE Transactions on Automatic Control
Year: 2022
ISSN: ['0018-9286', '1558-2523', '2334-3303']
DOI: https://doi.org/10.1109/tac.2022.3201141